Chapter 1


Historical  Note 


In  1978  [1],  a  construction  was  published  of  Improved  Iterated  Codes 
(IIC)  which  were  inspired  by  Elias  [2]  and  constructed  from  primitive  BCH 
codes  [3].  IIC  provided  one  of  the  few  examples  at  the  time  of  codes  which 
provide  both: 

    \lim_{n \to \infty} \Pr(\mathrm{error}) = 0    (1)

and

    \lim_{n \to \infty} R > 0    (2)

where R is the code rate in information bits per transmitted symbol. Codes
with both properties have been euphemistically labeled “good” codes.

Basically,  the  IIC  consists  of  a  row  code  and  a  set  of  column  codes,  one 
column  for  each  information  position  in  the  row  code.  Each  column  code  is 
an  iteration  of  additional  BCH  codes  with  an  Elias  code.  The  IIC  is  best 
described  by  its  decoding  algorithm: 

1. Decode the row code. Compute the resulting bit error probability.
Set i = 1.

2. Decode the ith column code. The error probability has now been
reduced to an arbitrarily small value, and the first information position in
the row code is assumed correct. Subtract that information position from the
row code, thus “shortening” the row code (see Peterson [3]), i.e., reducing
by one the number of degrees of freedom or bits of information that the
remaining part of the codeword represents.


3.  Decode  the  shortened  row  code.  The  resulting  bit  error  probability 
is  no  greater  than  it  was  following  Step  1,  and  it  may  be  lower  because 
the  minimum  distance  of  the  code  is  unchanged  while  the  block  length  is 
smaller. 

4.  Increment  i  and  go  to  Step  2. 

On  the  binary  symmetric  channel,  the  ratio  of  channel  capacity  to  rate 
for  p  =  0.1  was  found  to  be  approximately  2.4  [1].  However,  these  iterated 
codes  are  extremely  long  and,  therefore,  mostly  of  theoretical  interest.  If, 
however,  the  decoding  of  the  row  code  can  be  made  to  produce  a  lower  error 
probability,  then  the  columns  can  be  constructed  from  fewer  constituent 
codes. The resulting iterated code will be shorter and may have a larger
rate.^1

To this end, at a NATO Advanced Study Institute devoted to communications
issues, Farrell [4] suggested that the rates of these codes could
be  improved  by  incorporating  soft  decision  techniques  into  the  decoder.  In 
particular,  soft  decision  decoding  of  the  row  code  promises  to  give  better 
estimates  of  the  decoding  error  probability  at  the  first  step  and,  perhaps,  to 
permit  a  shorter  column  code  of  higher  rate  to  be  iterated  with  it  to  achieve 
the  same,  arbitrarily  low  error  probability  as  yielded  by  the  original  IIC  but 
at  a  larger  value  of  code  rate. 

In what follows, soft decision decoding is explained, and techniques from
the literature are studied with an eye to selecting those which may provide
lower error probabilities when decoding the rows of an IIC.


^1 The rate of an iterated code is the product of the rates of its constituents. Each of
the constituent rates is a positive number less than unity.

Chapter 2


Background 


Traditionally,  coding  and  decoding  for  error  control  are  designed  for 
discrete  channels  which  communicate  symbols  from  finite  sets  only.  Usually, 
such  a  finite  set  is  a  finite  field  [5]  or  an  algebraic  extension  of  a  finite 
field.  Channel  noise  is  represented  by  the  probabilistic  transition  of  one 
transmitted  symbol  to  another  received  symbol. 

This  discrete  approach  is  attractive  because  it  permits  strong  focus  on 
the  design  and  implementation  of  good  codes  and  efficient  decoders  using 
powerful  tools  from  such  disciplines  as  algebra,  combinatorics,  and  digital 
design. This approach is also useful because many continuous channels can
be modeled as discrete channels by incorporating the modulator, demodulator,
and threshold device into the channel and by modeling the effects
of noise as the probability of an output symbol conditioned on the symbol
transmitted. The most common example of such a channel model is the
discrete memoryless channel [6] which has, in addition to the foregoing
characteristics, the property that successive symbol transitions are statistically
independent.

However,  the  use  of  this  abstraction  is  not  without  its  penalties.  While 
the  channel  input  symbols  are  discrete  and  mutually  distinct,  the  channel 
noise  is  a  continuous  waveform  which  is  combined  (usually  added)  with  the 
input  signal  so  that  the  channel  output  is  always  a  continuous  function  of 
time.  A  discrete  channel  output  representation  is  realized  by  quantizing  the 
continuous output into two or more levels. In the binary case, whenever
the channel output exceeds a preset threshold, the output is said to be “1.”
Whenever it does not, the output value is taken to be “0.” Thus, the output
signal is 1 regardless of whether it just barely exceeds the threshold or
exceeds the threshold by a large value. For many noise waveforms, the value
of the output symbol is more uncertain as the channel output approaches
the threshold that separates 1 from 0. If the error control decoder has
information about the relative likelihoods (probabilities) of received symbol
errors, it can in many cases correct more errors in that word than if all
symbols were assumed equally likely to be correct. That is, output symbols
having the same value do not all necessarily inspire equal “confidence” in their
values. It has been shown [7] that without this useful likelihood information
(actually, it is channel state information) approximately 2.0 dB more
transmitter power is needed to achieve the same decoded error probability that
can be achieved with that information.

Decoding  techniques  which  use  estimates  of  the  actual  values  of  the 
channel  output  waveform  are  called  soft  decision  decoding.  Those  which  use 
estimates  of  only  the  discrete  transmitted  symbols  are  called  hard  decision 
decoding. 

What follows is an introduction and survey of the development of soft
decision techniques. First, we examine decoders which erase (with meaning
to be made more precise) symbols having low measures of likelihood,
substituting various combinations of permitted symbol values until predetermined
decoding criteria are met. From these we move to decoders which
make more direct use of the channel output values and finally to techniques
built upon the examination of each received symbol.

To  provide  focus  and  continuity,  we  deal  with  block  codes  only.  This 
does  not  imply  judgement  on  the  merits  of  convolutional  codes  and  Viterbi 
decoding  [8]  but  tries  to  examine  thoroughly  one  aspect  of  soft  decision 
decoding.  An  examination  of  the  References  reveals  at  least  a  theoretical 
interest  in  soft  decision  decoding  techniques  from  the  earliest  days  of  error 
control  coding  theory.  Like  so  many  things,  soft  decision  is  experiencing  a 
resurgence  of  interest  that  may  be  due,  in  part,  to  the  escalating  capabilities 
afforded  by  semiconductor  technology,  making  possible  today  techniques 
that  once  were  only  theoretical  ideas. 
A standard set of notation has been attempted in this report. The following
symbols retain their respective meanings throughout. Where additional
entities are needed, they are defined locally.


\lfloor a \rfloor = the largest integer \le a
C = a code taking symbols from {±1}
c = a member of C
C" = code C with symbols from {0, 1}
c" = a member of C"

d = the minimum Hamming distance of C [3]
d_E = Euclidean distance
n = length (number of symbols) of a codeword
r = a vector or word (n-tuple) received from the channel
t = \lfloor (d-1)/2 \rfloor, the guaranteed error correction capability of a block code

x = the received n-tuple after hard decision processing
q = a vector of confidence values

=  the  vector  of  bit  log  likelihood  ratios. 


Chapter 3

Simple  Erasure  Schemes 


When  a  codeword  is  transmitted  through  a  noisy  channel,  the  set  of 
symbols  or  waveform  values  produced  at  the  receiver  output  is  called  the 
received  word  or  received  vector. 

An  erasure  is  a  symbol  of  unknown  value  at  a  known  location  within 
the  received  word.  By  contrast,  an  error  is  a  symbol  of  unknown  value  at 
an  unknown  location.  Thus,  the  decoder  knows  the  number  of  erasures  and 
their  locations  but  knows  nothing  of  the  number  or  locations  of  errors. 


3.1  Forcing  Erasures 


Almost  from  its  beginning,  algebraic  coding  theory  provided  methods  for 
computing  the  values  of  symbols  that  were  erased  by  a  noisy  binary  channel 
provided  the  number  of  such  erasures  is  bounded  [3].  A  principal  advantage 
of  introducing  erasures  is  that  many  algebraic  decoding  algorithms  can  be 
modified  to  handle  erasures  efficiently  [9].  These  techniques  simply  erase  one 
or  more  of  the  received  symbols  and  apply  erasure  correction  techniques  [10]. 
Since a block code with minimum distance d can correct up to t = \lfloor (d-1)/2 \rfloor
errors or d - 1 erasures [3], one would like to deal with erased symbols
in known locations rather than with errors in unknown locations. If an
appropriate measure of confidence or reliability is chosen (e.g., a monotone
function of the signal to noise ratio at the receiver output), then the received
symbols with lowest reliability are most likely^1 to occupy the error locations
and can be considered erased.

In  this  section  we  consider  soft  decision  decoders  that  use  both  the  output 
of  a  hard  decision  demodulator  and  a  set  of  reliability  estimates  produced 
by  the  receiver.  Typically,  one  or  more  of  the  lowest  reliability  positions 
in  a  received  word  are  erased.  The  decoder  then  tries  various  patterns  of 
channel  input  symbols  in  the  erased  locations  to  determine  if  a  hard  decision 
decoder  can  produce  one  or  more  codewords  from  this  “trial  vector.”  Rules 
are  given  to  choose  among  the  multiple  codewords  which  may  arise. 


3.2  Wagner  Decoding 


Wagner decoding^2 is applied to codes constructed by appending a single
parity check to a block of m information digits. Based upon the received
vector r having real components, the receiver computes p(1|r) and p(0|r) and
makes a hard decision on each symbol based on which of these probabilities
is larger. (The decoder is assumed to know the channel noise probability
distribution.) If parity checks, the hard decision block is accepted as it
is. If parity fails, the position with the lowest value of \Delta p = |p(1|r) -
p(0|r)| is inverted to force parity to check. This is perhaps the simplest
example of an “erase and substitute” technique. Analysis [10] shows that, for
moderately noisy channels having values of bit error probability around 0.01,
Wagner decoding produces lower values of word error probability than does
the Hamming code. For example, for p = 0.01, the Wagner decoder produces
a word error rate of 0.001 while a Hamming code yields approximately 0.003
[10].
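The Wagner rule can be sketched in a few lines. This is a minimal illustration, not the analysis of [10]: it assumes even parity over the whole block and assumes the receiver hands the decoder the posterior p(1|r) for each digit (the function name `wagner_decode` and the list `p1` are hypothetical).

```python
def wagner_decode(p1):
    """Wagner decoding of a single-parity-check block.

    p1[i] is the receiver's estimate of p(1 | r_i) for digit i.
    Even parity over the whole block is an assumption of this sketch.
    """
    # Hard decision: pick whichever of p(1|r), p(0|r) is larger.
    hard = [1 if p >= 0.5 else 0 for p in p1]
    if sum(hard) % 2 == 0:
        return hard  # parity checks, accept the block as it is
    # Since p(0|r) = 1 - p(1|r), |p(1|r) - p(0|r)| = |2 p(1|r) - 1|.
    weakest = min(range(len(p1)), key=lambda i: abs(2 * p1[i] - 1))
    hard[weakest] ^= 1  # invert the least reliable digit
    return hard


# Parity fails on the hard decisions [1, 0, 1, 1]; digit 2 is least reliable.
decoded = wagner_decode([0.9, 0.1, 0.55, 0.8])
```

Only one digit is ever changed, so a single error is repaired whenever it falls on the least reliable position.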

Subsequently, Balser and Silverman [11] introduced multiple error correction
to this scheme by adding additional check digits to the transmitted
block and, if a double error is detected, inverting the two least reliable digits
in the received word. In fact, they used the Hamming code [12] to provide
the additional parity check structure since it has minimum distance of four
and can, therefore, correct one error per block or detect the presence of two
(or any even number).

^1 At the channel output, the symbol error probability is a monotone decreasing function
of the signal to noise ratio. See, for example, [7].

^2 Wagner decoding was named by Balser and Silverman [10] for C.A. Wagner of MIT
who, in 1954, “suggested the basic idea.”

The  original  Wagner  algorithm  is  found  in  the  final  Step  of  Chase’s  rank 
decoding  algorithm  [13]  which  is  discussed  later. 

3.3  Forced-Erasure  Decoding


One difficulty with applying erasure reconstruction techniques is that the
decoder does not know exactly how many positions should be erased; that
is, undetected errors are likely to exist in the unerased positions [14].
With erasure reconstruction only (no error correction), there are no guarantees
that all symbols containing errors will be erased, even when the total
number of errors is \lfloor (d-1)/2 \rfloor or fewer. In forced erasure decoding (FED), the
received symbols within a word are ordered by increasing value of
“confidence.” Confidence is proportional to the a posteriori probability of a
transmitted symbol value conditioned upon the received symbol. If the n
symbols in a block have confidence values {q_1, q_2, ..., q_n}, reordering gives
q_{j_1} \le q_{j_2} \le ... \le q_{j_n}. Reconstruction of erased symbols is performed by
trying a set of allowed symbol values in the erased position and testing the
resulting vector for code membership. The algorithm is:

1. Set i = 1.

2. Erase the first i symbols ordered by increasing confidence. Attempt
reconstruction.

3. If a resulting vector belongs to the code, accept it as the decoded
word and stop.^3 If i > 2t, there may be more than one reconstruction which
produces a codeword. Depending upon the type of channel noise assumed,
a rational basis for choosing among these is needed. Choosing the solution
which represents a minimum weight error pattern is intuitively useful for
the Gaussian channel. No choice can be shown to be foolproof, however.

4. If no reconstructed codeword is found, increment i by 1. If i \le n - k,
go to 2. If i > n - k, stop, and declare a decoding failure.

^3 Reason: If the received vector is within d of a codeword, it will be within d of exactly
one codeword. Therefore, when the first codeword is found, it is unnecessary to look
further.
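The four steps can be sketched as follows. This is an illustrative sketch under stated assumptions: code membership is tested by brute-force lookup in an explicit `codebook` (a hypothetical stand-in for an algebraic membership test), symbols are binary, and ties among several reconstructions are broken by the minimum weight error pattern suggested in Step 3.

```python
from itertools import product


def forced_erasure_decode(x, q, codebook, n_minus_k):
    """Forced erasure decoding of a binary hard-decision word x.

    x          tuple of 0/1 hard decisions
    q          confidence value for each position of x
    codebook   set of codeword tuples (brute-force membership test)
    n_minus_k  largest number of positions that will be erased
    """
    order = sorted(range(len(x)), key=lambda j: q[j])  # increasing confidence
    for i in range(1, n_minus_k + 1):
        erased = order[:i]
        found = []
        # Try every pattern of binary values in the erased positions.
        for fill in product((0, 1), repeat=i):
            trial = list(x)
            for pos, bit in zip(erased, fill):
                trial[pos] = bit
            trial = tuple(trial)
            if trial in codebook:
                found.append(trial)
        if found:
            # Several reconstructions may succeed; keep the one that
            # changes the fewest hard decisions (minimum weight pattern).
            return min(found, key=lambda c: sum(a != b for a, b in zip(c, x)))
    return None  # decoding failure


# Length-5 repetition code, so n - k = 4.
rep5 = {(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)}
word = forced_erasure_decode((1, 1, 0, 0, 0), [0.9, 0.8, 0.1, 0.2, 0.3], rep5, 4)
```

In the example the three low-confidence zeros are erased one after another, and the first reconstruction that lands in the code is the all-ones word.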


3.4  Channel  Measurement  Decoding 


The foregoing methods of forcing erasures and substituting known patterns
of channel input symbols assume that finding the codeword at minimum
Hamming distance^4 from the initial hard-decision vector produces the
codeword most likely transmitted. However, this need not follow. Suppose
codeword c_i is transmitted and, after hard decision, vector x is received. A
binary decoder can produce codeword c_a from x by the criterion of minimum
Hamming distance decoding. On the other hand, minimum Euclidean
distance decoding prior to hard decision might produce codeword c_b \ne c_a.

Why is this? Some of the positions that were changed by the minimum
d decoder may have been more reliable than some which were not. For
example, if the first three positions of vector r = (r_1, r_2, r_3, r_4, r_5) have high
values of some reliability measure while the last two have low values, then
the most likely transmitted codeword might be one that would agree with r
in the first three positions. The decoder, then, selects values for the last two
positions that produce a waveform closest to that received. To minimize
this “analog distance” between waveforms, a set of real reliability numbers,
as described below, is used to define an analog or real measure of distance
in the decoding procedure. A consequence is that a different set of positions
may be “corrected” than when Hamming distance is minimized.

          channel         HDD          min d
    c_i ----------> r ----------> x ----------> c_a
                    |
                    | min d_E
                    v
                   c_b

^4 The Hamming distance between two vectors is simply the count of the number of
places in which they differ.

So, if there is a codeword c_a differing from the binary received vector x in
t or fewer positions, a binary decoder will identify it. We often say that the
binary decoder determines the error pattern e having minimum Hamming
weight such that e = x \oplus c_a where \oplus represents addition modulo 2. Channel
Measurement Decoding (CMD) [15] expands upon this by trying to find a
small set of likely error patterns and choosing from this set the one having
the smallest “analog weight.” CMD uses a set of real numbers {\alpha_i, i =
1, 2, ..., n} to represent the channel state. They have the property that
\alpha_i > \alpha_j implies that x_i is more likely to be correct than x_j, and they are
called channel measurement information. To use this,

•  From the received binary vector x, obtain another vector x' by inverting
some of the digits.

•  Consider x' as a received vector and use a binary decoder which implements
bounded distance decoding^5 to try to find its error pattern e'.

•  For all the new vectors {x'} obtained by inverting selected sets of digits,
find the corresponding error patterns e' and select the one having
minimum analog weight W_\alpha(e') given below.

    W_\alpha(e') = \sum_{i=1}^{n} \alpha_i e'_i    (3)

As  the  channel  signal  to  noise  ratio  increases,  the  performance  of  CMD 
can  approach  that  of  maximum  likelihood  decoding. 

It is desirable to select the digits to be inverted in CMD so that the
number of error patterns e' is small and so that the vectors x' lie “near” the
received word. Chase [15] offered several algorithms for selecting the sets of
digits to be inverted in CMD to meet these criteria.
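A small sketch may make the CMD procedure concrete. Several things here are assumptions made for illustration and are not taken from [15]: the test patterns are all subsets of the p least reliable digits, the bounded distance decoder is a brute-force search over an explicit codebook, and the analog weight is measured on the total pattern x \oplus c so that inverted digits are also charged.

```python
from itertools import combinations


def bounded_distance_decode(v, codebook, t):
    """Return the codeword within Hamming distance t of v, or None."""
    for c in codebook:
        if sum(a != b for a, b in zip(v, c)) <= t:
            return c
    return None


def analog_weight(e, alpha):
    """Sum of the channel measurements alpha_i over positions where e_i = 1."""
    return sum(a for a, bit in zip(alpha, e) if bit)


def cmd_decode(x, alpha, codebook, t, p):
    """Channel measurement decoding with test patterns on the p weakest digits."""
    weakest = sorted(range(len(x)), key=lambda i: alpha[i])[:p]
    best, best_weight = None, float("inf")
    for size in range(p + 1):
        for flips in combinations(weakest, size):
            x_prime = list(x)
            for i in flips:
                x_prime[i] ^= 1  # invert the selected digits
            c = bounded_distance_decode(x_prime, codebook, t)
            if c is None:
                continue
            e = [a ^ b for a, b in zip(x, c)]  # total error pattern w.r.t. x
            weight = analog_weight(e, alpha)
            if weight < best_weight:
                best, best_weight = c, weight
    return best


rep5 = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]   # d = 5, t = 2
x = (1, 1, 1, 0, 0)                          # hard decisions
alpha = [0.1, 0.1, 0.1, 0.9, 0.9]            # the three ones are unreliable
hard_choice = bounded_distance_decode(x, rep5, 2)
soft_choice = cmd_decode(x, alpha, rep5, t=2, p=2)
```

Here the hard decision decoder picks the all-ones word (two changes), while CMD prefers the all-zeros word because its three changes all fall on low-reliability digits.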


^5 A bounded distance decoder can correct no more than t \le \lfloor (d-1)/2 \rfloor errors,
irrespective of the actual capability of the code.

Chapter 4


Soft  Decision  Decoding  — 
The  Paradigm 


A hard decision decoder measures the distance between the received word
and various candidate code words by using Hamming distance (see Chapter
3). This chapter discusses decoding techniques which use estimates of the
real value of each received symbol in order to establish relative confidence in
the decoded symbols. This permits one to measure distances between words
in terms of real values rather than counts of differences.


4.1  Errors  and  Erasures  Decoding 


Let c_m be a codeword with symbols from {±1}. If x is the received
vector after hard decision processing (but not decoding) and also has values
from {±1}, then “conditional” Hamming distance decoding (decode only
when there is a codeword within minimum distance of the received, hard
decision word) is permitted by this theorem:^1

Theorem 1  There exists at most one codeword c_m such that

    x \cdot c_m > n - d    (4)

where the left side is computed as the scalar product of two real vectors.

^1 Proofs of the theorems in this section and the next can be found in [16] and will not
be repeated here.


This  theorem,  and  others  in  this  chapter  having  a  similar  form  can  be 
thought  of  as  generalizations  of  this  well-known  result  for  hard  decision 
decoding  and  binary  codes: 


There is at most one codeword c at Hamming distance less than d/2 from
the received binary vector x [3].


In these generalizations, the minimum distance of the code is “shared” in
a discrete or “continuous” fashion (depending upon the nature of the received
vector under consideration) over the values of the received symbols
as necessary to achieve minimum d_E decoding.
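To see why condition (4) is a statement about Hamming distance, note that for two vectors over {±1} each agreeing position contributes +1 to the scalar product and each disagreeing position contributes -1. Writing d_H for the Hamming distance (a symbol introduced here only for this illustration), a short worked derivation is:

```latex
\begin{align*}
x \cdot c_m &= \bigl(n - d_H(x, c_m)\bigr) - d_H(x, c_m) = n - 2\, d_H(x, c_m), \\
x \cdot c_m > n - d &\iff d_H(x, c_m) < \tfrac{d}{2}.
\end{align*}
```

If two distinct codewords both satisfied this condition, the triangle inequality would place them at Hamming distance less than d from each other, which contradicts the minimum distance of the code.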

As  indicated  earlier,  a  natural  generalization  of  Wagner  decoding  is  to 
erase  several  unreliable  received  symbols  having  reliability  values  less  than 
a  preset  value  [17]. 

In fact, one can define a null zone of amplitudes of the received waveform
which represent the most unreliable values because they are closest to the
decision threshold between 1 and 0. Let r = (r_1, r_2, ..., r_n) be a received
vector having real, continuous components taking values on the closed
interval [-1, 1]. Define the quantized value of each component r_i as

    \begin{cases} +1 & \text{if } r_i > T \\ 0 & \text{if } -T < r_i < T \\ -1 & \text{if } r_i < -T. \end{cases}    (5)

Values falling within the null zone defined by -T < r_i < T are erased, thus
presenting to the decoder a word constructed from an alphabet of three
symbols, {-1, 0, 1}, and possibly containing erasures in known positions and
errors in unknown positions. This suggests errors and erasures decoding
(EED), which is based on the following theorem [16]:
Theorem 2  There is at most one code word c_m from a code of length n
and minimum distance d such that

    r \cdot c_m = n - 2 t_m - s > n - d    (6)

where t_m and s are the numbers of errors and erasures, respectively, that the
code is guaranteed to correct.
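The null zone quantizer and the scalar product in (6) are easy to check numerically. The sketch below is only an illustration: the threshold value, the sample vectors, and the choice to erase values lying exactly on the boundary ±T are assumptions, since the text leaves the boundary case open.

```python
def null_zone_quantize(r, T):
    """Map each real component of r to +1, 0 (erasure), or -1."""
    out = []
    for ri in r:
        if ri > T:
            out.append(1)
        elif ri < -T:
            out.append(-1)
        else:
            out.append(0)  # inside the null zone (boundary erased by assumption)
    return out


def correlation(y, c):
    """Scalar product of two real vectors."""
    return sum(a * b for a, b in zip(y, c))


c = [1, 1, -1, -1, 1]               # transmitted codeword over {+1, -1}
r = [0.9, -0.6, -0.1, -0.8, 0.2]    # received values; position 1 is in error
y = null_zone_quantize(r, T=0.3)    # two components fall in the null zone

errors = sum(1 for a, b in zip(y, c) if a != 0 and a != b)
erasures = y.count(0)
```

With one error and two erasures in a word of length 5, the scalar product of the quantized word with c equals n - 2t - s = 5 - 2 - 2 = 1.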


This exploits the fact [3] that a block code with minimum distance d can
correct any pattern of t errors and s erasures so long as t and s satisfy

    2t + s < d.    (7)

A three step procedure can be used for EED of a binary code when
decoding for s erasures and an unknown number of errors in the received
word. Let t be the number of errors per word guaranteed correctable by the
minimum distance of the code:

    d = 2t + 1.    (8)

Then:


•  Set  all  s  erased  bits  to  0  and  allow  the  decoder  to  correct  up  to  t  errors. 

•  Set  all  s  erased  bits  to  1  and  decode  up  to  t  errors. 

•  If each of these produces a codeword, the decoder accepts the one
which would have experienced the smaller number of errors in transmission.
In this case, it has been shown [17] that the decoding which
requires the smaller number of changes produces the actual transmitted
word.
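The three step procedure can be sketched as below. The hard decision decoder is an assumption of this sketch: `make_bounded_distance_decoder` is a hypothetical brute-force stand-in for whatever algebraic decoder is actually available, and erasures are represented by `None`.

```python
def make_bounded_distance_decoder(codebook, t):
    """Return a decoder that corrects up to t errors by exhaustive search."""
    def decode(v):
        for c in codebook:
            if sum(a != b for a, b in zip(v, c)) <= t:
                return c
        return None
    return decode


def eed_decode(y, decode):
    """Errors and erasures decoding of a binary word y (None marks an erasure)."""
    candidates = []
    for fill in (0, 1):
        # Steps 1 and 2: set every erased bit to the same value and decode.
        trial = tuple(fill if b is None else b for b in y)
        c = decode(trial)
        if c is not None:
            # Count changes in the unerased positions only.
            changes = sum(1 for a, b in zip(y, c) if a is not None and a != b)
            candidates.append((changes, c))
    if not candidates:
        return None
    # Step 3: accept the codeword implying the smaller number of errors.
    return min(candidates, key=lambda item: item[0])[1]


rep5 = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]       # d = 5, so t = 2
decoder = make_bounded_distance_decoder(rep5, t=2)
result = eed_decode((1, None, None, 1, 0), decoder)
```

In the example both fillings decode to a codeword; the all-ones word is accepted because it implies one transmission error rather than two.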


4.2  Generalized  Minimum  Distance  Decoding 


Generalized Minimum Distance (GMD) decoding was developed to provide
a systematic way of minimizing the Euclidean distance between the
received vector and the code rather than making hard decisions and
minimizing the Hamming distance.


The receiver is assumed to produce a vector r = (r_1, ..., r_n) such that
-1 \le r_j \le 1, j = 1, 2, ..., n. The Euclidean distance between the received
vector and the mth codeword is

    d_E^2(r, c_m) = |r - c_m|^2 = |r|^2 + |c_m|^2 - 2\, r \cdot c_m.    (9)

Formally, the problem is to minimize d_E over the message number m.
Since |r|^2 does not depend on m and |c_m|^2 = n for every codeword with
symbols from {±1}, this is achieved by maximizing r \cdot c_m in the foregoing.

We have this important theorem for generalized minimum distance decoding [16]:


Theorem 3  There exists at most one c_m such that

    r \cdot c_m > n - d.    (10)


Any decoding which meets this criterion is called generalized minimum
distance (GMD) decoding.^2


To use GMD decoding, order the indices on the received symbols so that:

    |r_{j_1}| \le |r_{j_2}| \le ... \le |r_{j_n}|.    (11)

Then define:

    s(r_j) = \begin{cases} +1 & \text{if } r_j \ge 0 \\ -1 & \text{if } r_j < 0 \end{cases}    (12)

and, with the symbols indexed in the order of (11),

    q_k(r_j) = \begin{cases} 0, & 1 \le j \le k, \quad 0 \le k \le n \\ s(r_j), & k + 1 \le j \le n. \end{cases}    (13)

That is, q_k = (q_k(r_1), q_k(r_2), ..., q_k(r_n)) is zero in the k weakest components
of r and has values from {±1} in the remaining n - k positions.

^2 We emphasize that, in Theorem 3, the components of r take continuous values
whereas, in Theorem 1, the received vector takes values from {±1}.

To  adapt  EED  to  GMD  decoding,  we  need  the  next  theorem. 

Theorem 4  If r \cdot c_m > n - d, then for some k:

    q_k \cdot c_m > n - d.    (14)


The point of Theorem 4 is to permit the use of any errors and erasures
decoder on q_k according to the following algorithm.


•  Order the indices of the components of the received vector as in (11).

•  Set i = 0.

•  From the received word r, determine q_i as above.

•  If an errors and erasures decoder can find a codeword c_m corresponding
to q_i, then decode r as c_m and exit.

•  Else set i = i + 1. If i < d, return to the step that determines q_i. If i \ge d, quit.
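The loop above can be sketched as follows. Two departures from the text are assumptions of this sketch and should be read as such: the errors and erasures decoder is replaced by a brute-force search for a codeword satisfying (14), and each candidate is additionally tested against the criterion (10) of Theorem 3 before it is accepted, so that an early trial cannot return a codeword that fails the GMD condition.

```python
def dot(a, b):
    """Scalar product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))


def gmd_decode(r, codebook, d):
    """GMD decoding sketch for codewords over {+1, -1} and r in [-1, 1]^n."""
    n = len(r)
    order = sorted(range(n), key=lambda j: abs(r[j]))  # weakest first, as in (11)
    for k in range(d):                                  # erase k = 0, ..., d - 1 symbols
        erased = set(order[:k])
        q = [0 if j in erased else (1 if r[j] >= 0 else -1) for j in range(n)]
        for c in codebook:
            # (14) stands in for the errors and erasures decoder;
            # (10) is the acceptance test added by this sketch.
            if dot(q, c) > n - d and dot(r, c) > n - d:
                return c
    return None


plus = (1, 1, 1, 1, 1)
minus = (-1, -1, -1, -1, -1)
rep5 = [plus, minus]                      # n = 5, d = 5

r = (0.1, 0.2, 0.1, -0.9, -0.8)           # three weak positives, two strong negatives
soft_choice = gmd_decode(r, rep5, d=5)
```

A majority vote on the signs of r would choose the all-plus word; the sketch returns the all-minus word once the two weakest components have been erased, in agreement with the larger scalar product r \cdot c.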

The foregoing illustrates but one use of GMD decoding. Beyond this, many
investigators have extended it to other applications. Examples include decoding
on Q-ary output channels [18], burst error decoding on Q-ary output
channels [19], [20], and majority logic decoding [21]. A few of these, important
to the application cited in the first chapter, will now be discussed.


4.3  Decoding  the  Channel  with  Quantized  Output


As seen earlier, errors and erasures decoding provides an elementary
quantization of the continuous channel. Going beyond the definition of a
simple null zone centered on the 1/0 threshold, investigators [22,23] sought to
determine how far from the threshold the received signal amplitude lay. This
would provide a measure of confidence in the initial hard decision estimate.

In this paradigm, the continuous channel output is quantized into Q
levels (L_0, L_1, ..., L_{Q-1}) where Q is typically, but not necessarily, a power
of two. With each level L_i is associated a w-weight w_i which represents the
distance from the threshold to L_i or to L_{Q-1-i}. By assigning w_0 = 0, w_{Q-1} = 1,
and defining the w-distance between two levels as

    d_w(L_i, L_j) = |w_i - w_j|    (15)

we can define the w-distance between any two Q-ary n-tuples as

    d_w(x_i, x_j) = \sum_{p=1}^{n} d_w(x_{ip}, x_{jp}).    (16)

w-distance has been shown to be a true metric [23].

Minimum w-distance decoding of code C becomes simply: given received
Q-ary vector r, find the codeword that produces:

    \min_i d_w(r, c_i),  c_i \in C.    (17)

Then, the w-weight e_i of the error in the ith digit is the w-distance between
the ith transmitted and received symbols, and


Theorem 5  An (n,k) code with minimum Hamming distance d can correct
any error pattern with w-weight satisfying

    \sum_{i=1}^{n} e_i < d/2.    (18)


This is proved in [23]. In the absence of a systematic decoding algorithm,
one could simply compute the w-distance from the received vector to
each code word until one with value less than d/2 is found or until all are
computed and the smallest selected.
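The exhaustive search just described can be sketched directly from (15) through (17). The level weights used here (uniformly spaced, with w_0 = 0 and w_{Q-1} = 1) and the mapping of binary codeword symbols to the extreme levels 0 and Q - 1 are assumptions made for illustration.

```python
from fractions import Fraction


def w_distance(x, y, w):
    """w-distance (16) between two Q-ary n-tuples of level indices."""
    return sum(abs(w[a] - w[b]) for a, b in zip(x, y))


def min_w_distance_decode(r, codebook, w):
    """Exhaustive minimum w-distance decoding (17)."""
    return min(codebook, key=lambda c: w_distance(r, c, w))


Q = 4
w = [Fraction(i, Q - 1) for i in range(Q)]   # assumed weights 0, 1/3, 2/3, 1

# Length-3 repetition code (d = 3); bit 0 -> level 0, bit 1 -> level Q - 1.
codebook = [(0, 0, 0), (3, 3, 3)]

r = (3, 1, 1)                                # one strong "1", two weak "0"s
choice = min_w_distance_decode(r, codebook, w)
```

A hard decision majority vote on r would decode to the all-zeros word. The w-distance to (3, 3, 3) is 4/3, which is below d/2 = 3/2, so by Theorem 5 this error pattern is correctable and the search returns (3, 3, 3).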

Fortunately, it is not necessary to search exhaustively. Algorithms exist
which use the w-distance metric to augment the performance of any binary
decoder, thus making soft decision decoding improvements available for use
with a wide variety of block codes. Such techniques are called weighted
erasure decoding (WED) [23] although, in the strictest sense, nothing is
really erased. An example of a WED algorithm follows.

Expand each w-weight on a set of r \le Q - 1 positive real numbers v_1, ..., v_r using
binary coefficients:

    w_i = \sum_{j=1}^{r} a_{ij} v_j,  a_{ij} \in {0, 1}.    (19)

Express each received symbol in this representation, and write each received
word as the resulting array of r binary n-tuples. (It will soon become
obvious that the complexity of this decoder is sensitive to the value of r
which, therefore, should be kept as small as possible.) Using your favorite
binary decoder, decode each n-tuple in this array. The following quantities
are of interest:

F_j = the number of positions changed in the jth row by the
binary decoder

E_j = the actual number of errors that occurred in the jth row
E = \sum_i e_i = \sum_{j=1}^{r} v_j E_j = total w-weight of the error pattern
R_j = \max(0, 2d - 2F_j)

In any column define S_0 and S_1 to be the index sets of positions where
the value is 0 and 1 respectively. Then, decide that an information digit has
value 0 if

    \sum_{j \in S_0} R_j v_j > \sum_{j \in S_1} R_j v_j    (20)

and 1 otherwise.

NOTE:  This  algorithm  does  not  attempt  to  output  a  codeword  from  the 
decoder  but  rather  to  produce  a  set  of  “best”  estimates  of  the  information 
digits.  It  is,  therefore,  best  suited  to  applications  that  are  bit  oriented  rather 
than  block  or  word  oriented. 

For  channels  having  continuous  outputs,  Reddy  [24]  extended  WED  to 
two  cases  which  cover  many  important  applications. 

Reddy’s Type I channel is a binary input channel with continuous output
on [0, 1]. The Generalized Hamming Weight (GHW) of a channel output
is defined as GHW(e_i) = |e_i|. The GHW of a received vector is, as before,
the sum of the GHWs of its components. Reddy [24] proved:


Theorem 6  On a Type I channel, all error patterns having GHW < d/2 are
correctable.


It  is  instructive  to  outline  Reddy’s  algorithm  and  to  compare  it  with 
Yu's  restriction  of  GMD  to  Q-ary  channels.  (See  the  next  section.) 

Reddy allows the number of $w$-weights to be infinite and represents
every value $x_j \in [0, 1]$ by an expansion in the $w$-weights, $x_j = \sum_{i=1}^{\infty} b_{ij} w_i$,
where the $b_{ij}$ are binary coefficients. Define:

$V_i = (b_{i1}, b_{i2}, \ldots, b_{in})$  (21)

$L2D = \lceil \log_2(d+1) \rceil$  (22)

$w_i = 2^{-i}$.  (23)

The  algorithm  is: 

•  From the received word, derive $V_1, V_2, \ldots, V_{L2D}$.

•  Using the binary decoder, decode $V_1, V_2, \ldots, V_{L2D}$.

•  Select the decoded $V_i$ such that $\mathrm{GHD}(V_i, Y) < d/2$.

If d is odd, this will correct all error patterns having $\mathrm{GHD} < d/2$.
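The three steps can be sketched as follows. This is an illustration under stated assumptions rather than Reddy's implementation: the number of binary words is taken as the ceiling of log2(d+1), the generalized Hamming distance between a binary word and the received word is taken as the sum of absolute differences, and the names `bit_planes`, `ghd`, `reddy_decode`, and the toy decoder `majority3` are all hypothetical.

```python
def bit_planes(received, d):
    """Split each value in [0, 1] into its leading binary digits (weights 2^-i)."""
    L = d.bit_length()                 # equals ceil(log2(d + 1)) for d >= 1
    top = 2 ** L - 1
    # Truncate every value to L binary digits; 1.0 maps to all ones
    scaled = [min(int(y * 2 ** L), top) for y in received]
    # Plane i collects the i-th binary digit of every position
    return [[(s >> (L - i)) & 1 for s in scaled] for i in range(1, L + 1)]

def ghd(word, received):
    """Assumed generalized Hamming distance: sum of absolute differences."""
    return sum(abs(w - y) for w, y in zip(word, received))

def reddy_decode(received, d, binary_decoder):
    """Decode each binary plane; return the first result with GHD < d/2."""
    for plane in bit_planes(received, d):
        candidate = binary_decoder(plane)
        if candidate is not None and ghd(candidate, received) < d / 2:
            return candidate
    return None                        # no candidate met the bound

def majority3(bits):
    """Toy binary decoder for the length-3 repetition code (d = 3)."""
    bit = 1 if sum(bits) >= 2 else 0
    return [bit, bit, bit]
```

For example, `reddy_decode([0.9, 0.8, 0.3], 3, majority3)` decodes the most significant plane `[1, 1, 0]` to the all-ones word, whose distance from the received word is below d/2.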

Reddy's Type II channel takes input from an arbitrary alphabet $A$ and
presents as output pairs $(y, \alpha)$ where $y \in A$ and $\alpha \in [0, 1]$. Defining GHD
between $(y_1, \alpha_1)$ and $(y_2, \alpha_2)$ as $|\alpha_1 - \alpha_2|$, he showed that all error patterns
having $\mathrm{GHD} < d/2$ are correctable. Note that the obvious application
of this case is to the soft decision demodulator output which presents both
a tentative hard decision estimate and a confidence value for each demodulated
symbol.
4.4  Improved GMD Algorithms for Q-ary Channels


Yu and Costello [18] extended GMD (which was developed for the arbitrary
channel) to the Q-ary output channel. Relaxing the constraint in
Theorem 4 by dividing each component of r by the absolute value of
the largest component, and arguing on the Q-ary channel, produces a received
vector $\beta = (\beta_1, \ldots, \beta_n)$, $-1 \le \beta_i \le 1$, that satisfies the theorem:


Theorem 7 For any received $\beta$, there is at most one codeword c such that

$\beta \cdot c > n - d$.  (24)


That this is stronger than Theorem 4 can be seen by noting that, taken
together, the two theorems imply that

$r \cdot c > |r_M| (n - d)$  (25)

where $|r_M| \le 1$ is the magnitude of the largest component of r. This result
will be used in the next algorithm.

Now, the continuous value of each channel output symbol is quantized
to the nearest level in the set $\{2(h-1)/(Q-1) - 1,\ h = 1, 2, \ldots, Q\}$. The
channel output vector is thus replaced by its quantized version $\gamma$, which obeys
the following familiar-looking theorem.


Theorem 8 For any quantized received word $\gamma$, there exists at most one
codeword c such that

$\gamma \cdot c > n - d$.  (26)

When Q can be written as $q^l$, the algorithm for finding such a codeword is
quite similar to that of Reddy [24]. The quantized received vector $\gamma$ generates
$l$ $q$-ary words, each of which is decoded by a $q$-ary decoder. Together,
these produce a codeword, $c_i$. If $c_i$ satisfies Theorem 8, output it as the
decoded word and stop. Otherwise, try another of the $q$-ary outputs until all
are exhausted. If no codeword satisfies the theorem for the $\gamma$ at hand, the
one with the largest value of the dot product is output as the best estimate
because it is the "closest" to the received vector in the sense of Theorem 8.
However, it has been shown that, if Theorem 8 is satisfied, then at least one
of the $q$-ary outputs satisfies a similar condition.
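The normalization, the quantization to Q levels, and the acceptance test of Theorem 8 can be sketched as below. The helper names are hypothetical, and codewords are assumed to be given as vectors of +1 and -1; the splitting of the quantized word into q-ary words is not shown.

```python
def normalize(r):
    """Divide every component by the largest magnitude, so each lies in [-1, 1]."""
    peak = max(abs(x) for x in r)
    if peak == 0:
        return list(r)                 # all-zero word: nothing to scale
    return [x / peak for x in r]

def quantize(beta, Q):
    """Map each component to the nearest of the Q levels 2(h-1)/(Q-1) - 1."""
    levels = [2 * (h - 1) / (Q - 1) - 1 for h in range(1, Q + 1)]
    return [min(levels, key=lambda v: abs(v - b)) for b in beta]

def satisfies_theorem8(gamma, codeword, d):
    """Check gamma . c > n - d for a candidate codeword in {+1, -1}^n."""
    n = len(gamma)
    return sum(g * c for g, c in zip(gamma, codeword)) > n - d
```

With Q = 4 the levels are -1, -1/3, 1/3 and 1, so the received word (2.0, -1.0, 0.5) normalizes to (1, -0.5, 0.25) and quantizes to (1, -1/3, 1/3). For the length-3 repetition code (d = 3) only the all +1 codeword then passes the test.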


4.5  Rank  decoding 


Finally, we present a decoding technique that is in a rather different vein
from those preceding, all of which are related in some way to GMD. It
is included because, for its simplicity, it provides a useful amount of error
correction on noisy channels.

Chase  [13]  developed  rank  decoding  to  be  used  with  the  product  of  two 
(3,2)  parity  check  codes.  In  principle,  it  can  be  applied  [25]  to  any  binary 
code that is one-step orthogonalizable³ [26]. The received symbols within a
word are ordered according to the reliability measure accompanying the hard
decision  estimate.  Those  estimates  which  satisfy  all  checks  or  which  satisfy 
one  check  and  are  sufficiently  reliable  are  accepted.  Others  are  flagged. 
When  a  row  or  column  in  the  product  array  contains  only  one  undecoded 
position,  that  position  is  decoded  with  a  value  that  will  force  the  parity 
of  the  row  or  column  to  check,  as  in  Wagner  decoding.  Rank  decoding 
gives  coding  gains  of  2  to  4  dB  for  the  codes  originally  used  by  Chase. 
Coding  gain  decreases  as  longer  single  parity  check  equations  are  used;  this 
is  expected  since  the  rate  increases  and  the  ratio  of  minimum  distance  to 
length  decreases  with  block  length  [25]. 
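The final, Wagner-like step, in which a row or column with exactly one undecoded position is completed so that its parity checks, can be sketched as follows. Even parity and the representation of a flagged position as `None` are assumptions of this sketch, and `fill_single_unknown` is a hypothetical name; the reliability ordering and acceptance rules are not shown.

```python
def fill_single_unknown(line):
    """Complete a row or column that has exactly one flagged position.

    line holds 0, 1, or None (flagged). Even parity is assumed.
    """
    unknown = [i for i, b in enumerate(line) if b is None]
    if len(unknown) != 1:
        return list(line)              # zero or several flags: leave unchanged
    # The missing bit must equal the parity of the accepted bits
    parity = sum(b for b in line if b is not None) % 2
    out = list(line)
    out[unknown[0]] = parity
    return out
```

Applied alternately to rows and columns of the product array, each completed position may leave another row or column with a single flag, so the step can be repeated until nothing changes.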


³ A code is orthogonalizable if, for every information position, a set of d - 1 check
equations involving that position can be written so that no other position appears in more
than one.
Chapter 5

Optimal  Methods 


Perhaps less practical than the techniques of the preceding section, so-called
optimal methods demonstrate the limits of what is possible and give a measure
of the difficulty in achieving these limits.

Several  criteria  of  optimality  can  be  defined  for  decoders.  These  include: 


minimum  error  probability, 
maximum-likelihood,  and 
minimum  cost. 


We shall not consider minimum cost because, in a communications context,
all types of errors are generally of equal cost.

The minimum error probability criterion is to decode the received vector
r into the codeword $c_{m'}$ which is most likely to have been transmitted, given
that r was received. That is, choose $c_{m'}$ so that

$\Pr(c_{m'} \mid r) \ge \Pr(c_m \mid r), \quad m \ne m'$.  (27)


The maximum likelihood criterion is to decode r into the codeword $c_{m'}$ for
which r is most likely to have been received, given that $c_{m'}$ was transmitted.
Or, choose $c_{m'}$ so that

$\Pr(r \mid c_{m'}) \ge \Pr(r \mid c_m), \quad m \ne m'$.  (28)

Bayes' rule makes (27) equivalent to

$\Pr(c_{m'}) \Pr(r \mid c_{m'}) \ge \Pr(c_m) \Pr(r \mid c_m)$.  (29)

Hence, when codewords are equally likely to be transmitted, $\Pr(c_{m'}) = \Pr(c_m)$
and we have [27]:

Lemma  1  If  all  codewords  are  equally  likely  to  be  transmitted,  then  any 
maximum-likelihood  decoder  performs  minimum  error  probability  decoding. 
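The lemma is easy to check numerically. The sketch below assumes a binary symmetric channel with crossover probability `p` and represents codewords as tuples; the function names are hypothetical.

```python
def bsc_likelihood(r, c, p):
    """Pr(r | c) on a binary symmetric channel with crossover probability p."""
    dist = sum(a != b for a, b in zip(r, c))
    return p ** dist * (1 - p) ** (len(r) - dist)

def ml_decode(r, code, p):
    """Maximum likelihood: maximize Pr(r | c) over the code."""
    return max(code, key=lambda c: bsc_likelihood(r, c, p))

def map_decode(r, code, priors, p):
    """Minimum error probability: maximize Pr(c) Pr(r | c) over the code."""
    return max(code, key=lambda c: priors[c] * bsc_likelihood(r, c, p))
```

For the length-3 repetition code with p = 0.1 and r = (1, 1, 0), both decoders return (1, 1, 1) when the priors are equal. With priors of 0.95 and 0.05 the minimum error probability decoder returns (0, 0, 0) instead, which shows that the equal-likelihood hypothesis of the lemma is needed.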

The  decoding  error  can  be  minimized  over  a  word  or  over  each  symbol 
within  a  word. 


5.1  Optimal  Word  Decoding 


Optimal word decoding will mean maximum likelihood or minimum probability
of error decoding since our concern is with equiprobable code words.
Hwang [28] has presented an algorithm which, like correlation decoding, is
guaranteed to be optimal and which presents avenues for reducing its complexity
by exploiting the algebraic structure of the linear block code.


Unlike  most  of  the  soft  decision  decoders  heretofore  presented,  this  one 
has  no  associated  binary  decoder.  In  the  spirit  of  reducing  complexity,  it 
does  not  require  multiplication  of  real  numbers. 

The received vector is r = c + e, where c is the {±1} representation of a
binary code vector and C is the set of code vectors over $\{\pm 1\}^n$. The error
vector e is an n-tuple of real numbers. The bit log likelihood ratios are

$\phi_i = \ln \frac{\Pr(r_i \mid 1)}{\Pr(r_i \mid -1)}, \quad i = 1, 2, \ldots, n$  (30)

and are the components of the so-called channel measurement information
vector of r: $\phi = (\phi_1, \phi_2, \ldots, \phi_n)$.

The following definition is needed.

For n-tuples a and b,

$a \times b = (a_1 b_1, a_2 b_2, \ldots, a_n b_n)$  (31)

and is a vector.

If $C^*$ is the binary code, define a subset $C_A$ of code vectors thus:

$C_A = \{ c^* \in C^* : \sum_{i=1}^{k} c^*_i < \sum_{j=k+1}^{n} c^*_j \}$  (32)

That is, $C_A$ is the set of codewords in $C^*$ which have more 1's in the last
n - k positions than in the first k positions.

Hwang's algorithm, which follows, exemplifies the types of computation
needed to do maximum likelihood decoding and provides a vehicle for
discussing the complexity of such decoders. An algorithm for finding $c_m$,
the maximum likelihood codeword, is:

•  Calculate $\phi$. Find $c_1$ such that $\phi_i c_{1i} > 0$, $i = 1, 2, \ldots, k$. Set
$x = \phi \times c_1$.

•  Check values of $c^* \cdot x$ for all $c^* \in C_A$.

•  If $c^*_m \cdot x < 0$ and $c^*_m$ minimizes $c^* \cdot x$, then set $c_m = c_1 \times c^*_m$. Else
set $c_m = c_1$.

Note how the complexity of the algorithm depends directly on the size of
$C_A$.
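For comparison, the exhaustive baseline that any such algorithm must match can be written directly. This is not Hwang's algorithm: it is a brute-force correlation decoder that searches all $2^k$ codewords. It assumes additive white Gaussian noise of standard deviation `sigma`, for which the log likelihood ratio is $2 r_i / \sigma^2$, and it assumes that binary 1 maps to +1 and binary 0 to -1. All names are hypothetical.

```python
from itertools import product

def codewords_pm1(G):
    """All codewords generated by the rows of G, mapped 1 -> +1 and 0 -> -1."""
    k, n = len(G), len(G[0])
    words = []
    for u in product([0, 1], repeat=k):
        bits = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        words.append(tuple(2 * b - 1 for b in bits))
    return words

def llr_awgn(r, sigma):
    """Bit log likelihood ratios ln Pr(r_i | +1) / Pr(r_i | -1) for Gaussian noise."""
    return [2 * x / sigma ** 2 for x in r]

def ml_word(r, G, sigma=1.0):
    """Exhaustive maximum likelihood: the codeword with the largest phi . c."""
    phi = llr_awgn(r, sigma)
    return max(codewords_pm1(G),
               key=lambda c: sum(p * ci for p, ci in zip(phi, c)))
```

For the (3,2) parity check code with generator rows (1,0,1) and (0,1,1), the received word (0.9, -1.1, -0.2) has hard decisions (+1, -1, -1), which is not a codeword; the search returns (+1, -1, +1), flipping the least reliable position. The cost grows as $2^k$, which is the complexity that restricting the search to a subset such as $C_A$ aims to reduce.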

5.2  Optimal  Symbol  Decoding 


To minimize the probability of a symbol error, select the symbol $c_m = a_i$
to maximize

$\Pr(c_m = a_i \mid r) = \sum_{c \in S_m(a_i)} \Pr(c \mid r)$  (33)

where $a_i \in$ the source alphabet and $S_m(a_i)$ is the set of all codewords where
the m-th symbol has value $a_i$. That is, select that value of the m-th codeword
symbol that minimizes the probability of symbol error conditioned on the
received vector.

An implementation of the minimum probability of symbol error for the
binary symmetric channel and for any linear code [29] results in the decoding
rule:

Set $c_m = 0$ if

$\sum_{j=1}^{2^{n-k}} \prod_{l=1}^{n} \left( \frac{1 - \phi_l}{1 + \phi_l} \right)^{c'_{jl} \oplus \delta_{ml}} > 0$  (34)

and $c_m = 1$ otherwise. $c'_{jl}$ is the l-th component of the j-th member of
the dual code $C'$, $\delta_{ml}$ is the Kronecker delta, $\oplus$ represents addition modulo
2, and $\phi_l$ is the bit likelihood ratio given by

$\phi_l = \frac{\Pr(r_l \mid 1)}{\Pr(r_l \mid 0)}$.  (35)
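The dual-code rule can be checked against the direct sum over codewords in (33). The sketch below assumes equiprobable codewords, takes the rule to be "decide 0 if the sum over dual codewords of the product of $((1-\phi_l)/(1+\phi_l))^{c'_{jl} \oplus \delta_{ml}}$ is positive", and finds the dual code by exhaustive search, which is only practical for small n. The function names and the 0-based index `m` are choices made for this illustration.

```python
from itertools import product

def span(G):
    """All binary codewords generated by the rows of G."""
    k, n = len(G), len(G[0])
    return [tuple(sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
            for u in product([0, 1], repeat=k)]

def dual(G):
    """Dual code: every n-tuple orthogonal (mod 2) to all rows of G."""
    n = len(G[0])
    return [v for v in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(v, g)) % 2 == 0 for g in G)]

def symbol_rule_dual(phi, G, m):
    """Decide bit m from the sum over the dual code."""
    rho = [(1 - f) / (1 + f) for f in phi]
    total = 0.0
    for cp in dual(G):
        term = 1.0
        for l, r in enumerate(rho):
            if cp[l] ^ (1 if l == m else 0):   # exponent c'_jl XOR delta_ml
                term *= r
        total += term
    return 0 if total > 0 else 1

def symbol_rule_direct(phi, G, m):
    """Same decision by summing codeword likelihoods directly."""
    score = [0.0, 0.0]
    for c in span(G):
        w = 1.0
        for l, f in enumerate(phi):
            if c[l]:
                w *= f                         # Pr(r_l | 1) / Pr(r_l | 0)
        score[c[m]] += w
    return 0 if score[0] > score[1] else 1
```

For the (3,2) parity check code the dual contains only two words, so the dual form needs two products where the direct form needs four. The saving grows with the rate of the code, since the dual has $2^{n-k}$ members against $2^k$ for the code itself.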

In general, decoders for such symbol decoding rules will produce an output
vector c which may not be a codeword. To extract the actual information
symbols (which is, after all, the point of the algorithm) when the code is
not systematic, a set of k linear equations must be solved after decoding.¹

To select the most likely transmitted m-th symbol, select $c_m = a_i$ which
maximizes

$\sum_{c \in S_m(a_i)} \Pr(r \mid c)$.

Bit  by  bit  algorithms  are  quite  complex  and  will  not  be  considered  further 
in  this  work. 


¹ In a systematic code, the k encoded information positions appear unaltered, usually
in the first or last k positions of the codeword. In a non-systematic code, every one of
the n encoded symbols is a linear combination of the k information symbols. That is,
$c_i = \sum_{j=0}^{k-1} g_{ji} u_j$, $i = 0, 1, \ldots, n - 1$, where the $u_j$ are the information symbols and the $g_{ji}$ are the coefficients of the combination.


Chapter  6 

Conclusions  and  Directions 


The purpose of this overview is to consider candidate soft decision techniques
for decoding IICs. In the original IIC decoding algorithm, the first
decoding of the row code from the channel reduces the error probability from
$p$ to $p_1$. Decoding the first column reduces $p_1$ to $\epsilon$, an arbitrarily small value,
thus giving estimates for the first position in each row with very high confidence.
After the decoded first position is subtracted from the row code, the
entire  process  is  repeated,  now  possibly  requiring  a  shorter  and  higher  rate 
code  for  the  second  column.  These  steps  are  iterated  until  each  information 
position  in  the  row  code,  and  hence  in  the  entire  code,  is  decoded. 

Soft decision techniques promise improvements to this process, viz.:

1.  smaller values of $p_1, p_2, \ldots$, thus requiring shorter and higher rate
column codes,

2.  the  creation  of  erasures  in  the  leftmost  column  to  be  reconstructed  by 
errors  and  erasures  decoding  of  the  column  code  with  a  possible  improvement 
in  code  rate,  and 

3.  a  redesign  of  Improved  Iterated  Codes  using  codes  more  easily  decoded 
than  BCH  codes,  counting  on  the  stronger  soft  decision  decoding  to  provide 
comparable  or  even  improved  performance. 


Soft decision decoders which should be examined for use in a decoder
for IIC include:

•  channel  measurement  decoders 

•  errors  and  erasures  decoding  with  GMD 

•  GMD  on  a  binary-input  discrete  memoryless  channel. 

Finally, certain techniques have been tailored specifically for iterated
codes and, for that reason, should be tried. These include

•  a  form  of  WED  [23] 

•  an  extension  of  GMD  to  iterated  codes  [20] 

•  a recent scheme built on Elias's codes [30].